Cocaine Noodles: Exploiting the Gap between Human and Machine Speech Recognition
نویسندگان
چکیده
Hands-free, voice-driven user input is gaining popularity, in part due to the increasing functionalities provided by intelligent digital assistances such as Siri, Cortana, and Google Now, and in part due to the proliferation of small devices that do not support more traditional, keyboardbased input. In this paper, we examine the gap in the mechanisms of speech recognition between human and machine. In particular, we ask the question, do the differences in how humans and machines understand spoken speech lead to exploitable vulnerabilities? We find, perhaps surprisingly, that these differences can be easily exploited by an adversary to produce sound which is intelligible as a command to a computer speech recognition system but is not easily understandable by humans. We discuss how a wide range of devices are vulnerable to such manipulation and describe how an attacker might use them to defraud victims or install malware, among other attacks.
منابع مشابه
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Abstract Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
متن کاملشبکه عصبی پیچشی با پنجرههای قابل تطبیق برای بازشناسی گفتار
Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
متن کاملClassification of emotional speech using spectral pattern features
Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...
متن کاملP65: Speech Recognition Based on Bbrain Signals by the Quantum Support Vector Machine for Inflammatory Patient ALS
People communicate with each other by exchanging verbal and visual expressions. However, paralyzed patients with various neurological diseases such as amyotrophic lateral sclerosis and cerebral ischemia have difficulties in daily communications because they cannot control their body voluntarily. In this context, brain-computer interface (BCI) has been studied as a tool of communication for thes...
متن کاملDesigning and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods
For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, the design and production of speech recognition systems have been considered by researchers. Among this, lip-reading techniques encountered with many challenges for speech recognition, that one of the challenges b...
متن کامل